Goto

Collaborating Authors

 gibbs update


A hybrid quantum-classical approach for inference on restricted Boltzmann machines

Kālis, Mārtiņš, Locāns, Andris, Šikovs, Rolands, Naseri, Hassan, Ambainis, Andris

arXiv.org Artificial Intelligence

Boltzmann machine is a powerful machine learning model with many real-world applications, for example by constructing deep belief networks. Statistical inference on a Boltzmann machine can be carried out by sampling from its posterior distribution. However, uniform sampling from such a model is not trivial due to an extremely multi-modal distribution. Quantum computers have the promise of solving some non-trivial problems in an efficient manner. We explored the application of a D-Wave quantum annealer to generate samples from a restricted Boltzmann machine. The samples are further improved by Markov chains in a hybrid quantum-classical setup. We demonstrated that quantum annealer samples can improve the performance of Gibbs sampling compared to random initialization. The hybrid setup is considerably more efficient than a pure classical sampling. We also investigated the impact of annealing parameters (temperature) to improve the quality of samples. By increasing the amount of classical processing (Gibbs updates) the benefit of quantum annealing vanishes, which may be justified by the limited performance of today's quantum computers compared to classical.


Faster MCMC for Gaussian Latent Position Network Models

Spencer, Neil A., Junker, Brian, Sweet, Tracy M.

arXiv.org Machine Learning

Latent position network models are a versatile tool in network science; applications include clustering entities, controlling for causal confounders, and defining priors over unobserved graphs. Estimating each node's latent position is typically framed as a Bayesian inference problem, with Metropolis within Gibbs being the most popular tool for approximating the posterior distribution. However, it is well-known that Metropolis within Gibbs is inefficient for large networks; the acceptance ratios are expensive to compute, and the resultant posterior draws are highly correlated. In this article, we propose an alternative Markov chain Monte Carlo strategy---defined using a combination of split Hamiltonian Monte Carlo and Firefly Monte Carlo---that leverages the posterior distribution's functional form for more efficient posterior computation. We demonstrate that these strategies outperform Metropolis within Gibbs and other algorithms on synthetic networks, as well as on real information-sharing networks of teachers and staff in a school district.


Particle-Gibbs Sampling For Bayesian Feature Allocation Models

Bouchard-Côté, Alexandre, Roth, Andrew

arXiv.org Machine Learning

Bayesian feature allocation models are a popular tool for modelling data with a combinatorial latent structure. Exact inference in these models is generally intractable and so practitioners typically apply Markov Chain Monte Carlo (MCMC) methods for posterior inference. The most widely used MCMC strategies rely on an element wise Gibbs update of the feature allocation matrix. These element wise updates can be inefficient as features are typically strongly correlated. To overcome this problem we have developed a Gibbs sampler that can update an entire row of the feature allocation matrix in a single move. However, this sampler is impractical for models with a large number of features as the computational complexity scales exponentially in the number of features. We develop a Particle Gibbs sampler that targets the same distribution as the row wise Gibbs updates, but has computational complexity that only grows linearly in the number of features. We compare the performance of our proposed methods to the standard Gibbs sampler using synthetic data from a range of feature allocation models. Our results suggest that row wise updates using the PG methodology can significantly improve the performance of samplers for feature allocation models.


Benchmarking Quantum Hardware for Training of Fully Visible Boltzmann Machines

Korenkevych, Dmytro, Xue, Yanbo, Bian, Zhengbing, Chudak, Fabian, Macready, William G., Rolfe, Jason, Andriyash, Evgeny

arXiv.org Machine Learning

Quantum annealing (QA) is a hardware-based heuristic optimization and sampling method applicable to discrete undirected graphical models. While similar to simulated annealing, QA relies on quantum, rather than thermal, effects to explore complex search spaces. For many classes of problems, QA is known to offer computational advantages over simulated annealing. Here we report on the ability of recent QA hardware to accelerate training of fully visible Boltzmann machines. We characterize the sampling distribution of QA hardware, and show that in many cases, the quantum distributions differ significantly from classical Boltzmann distributions. In spite of this difference, training (which seeks to match data and model statistics) using standard classical gradient updates is still effective. We investigate the use of QA for seeding Markov chains as an alternative to contrastive divergence (CD) and persistent contrastive divergence (PCD). Using $k=50$ Gibbs steps, we show that for problems with high-energy barriers between modes, QA-based seeds can improve upon chains with CD and PCD initializations. For these hard problems, QA gradient estimates are more accurate, and allow for faster learning. Furthermore, and interestingly, even the case of raw QA samples (that is, $k=0$) achieved similar improvements. We argue that this relates to the fact that we are training a quantum rather than classical Boltzmann distribution in this case. The learned parameters give rise to hardware QA distributions closely approximating classical Boltzmann distributions that are hard to train with CD/PCD.


A nonparametric HMM for genetic imputation and coalescent inference

Elliott, Lloyd T., Teh, Yee Whye

arXiv.org Machine Learning

Genetic sequence data are well described by hidden Markov models (HMMs) in which latent states correspond to clusters of similar mutation patterns. Theory from statistical genetics suggests that these HMMs are nonhomogeneous (their transition probabilities vary along the chromosome) and have large support for self transitions. We develop a new nonparametric model of genetic sequence data, based on the hierarchical Dirichlet process, which supports these self transitions and nonhomogeneity. Our model provides a parameterization of the genetic process that is more parsimonious than other more general nonparametric models which have previously been applied to population genetics. We provide truncation-free MCMC inference for our model using a new auxiliary sampling scheme for Bayesian nonparametric HMMs. In a series of experiments on male X chromosome data from the Thousand Genomes Project and also on data simulated from a population bottleneck we show the benefits of our model over the popular finite model fastPHASE, which can itself be seen as a parametric truncation of our model. We find that the number of HMM states found by our model is correlated with the time to the most recent common ancestor in population bottlenecks. This work demonstrates the flexibility of Bayesian nonparametrics applied to large and complex genetic data.


Learning in Markov Random Fields using Tempered Transitions

Salakhutdinov, Ruslan R.

Neural Information Processing Systems

Markov random fields (MRFs), or undirected graphical models, provide a powerful framework for modeling complex dependencies among random variables. Maximum likelihood learning in MRFs is hard due to the presence of the global normalizing constant. In this paper we consider a class of stochastic approximation algorithms of Robbins-Monro type that uses Markov chain Monte Carlo to do approximate maximum likelihood learning. We show that using MCMC operators based on tempered transitions enables the stochastic approximation algorithm to better explore highly multimodal distributions, which considerably improves parameter estimates in large densely-connected MRFs. Our results on MNIST and NORB datasets demonstrate that we can successfully learn good generative models of high-dimensional, richly structured data and perform well on digit and object recognition tasks.


On Input Selection with Reversible Jump Markov Chain Monte Carlo Sampling

Sykacek, Peter

Neural Information Processing Systems

In this paper we will treat input selection for a radial basis function (RBF) like classifier within a Bayesian framework. We approximate the a-posteriori distribution over both model coefficients and input subsets by samples drawn with Gibbs updates and reversible jump moves. Using some public datasets, we compare the classification accuracy of the method with a conventional ARD scheme. These datasets are also used to infer the a-posteriori probabilities of different inputsubsets.


On Input Selection with Reversible Jump Markov Chain Monte Carlo Sampling

Sykacek, Peter

Neural Information Processing Systems

In this paper we will treat input selection for a radial basis function (RBF) like classifier within a Bayesian framework. We approximate the a-posteriori distribution over both model coefficients and input subsets by samples drawn with Gibbs updates and reversible jump moves. Using some public datasets, we compare the classification accuracy of the method with a conventional ARD scheme. These datasets are also used to infer the a-posteriori probabilities of different input subsets. 1 Introduction Methods that aim to determine relevance of inputs have always interested researchers in various communities. Classical feature subset selection techniques, as reviewed in [1], use search algorithms and evaluation criteria to determine one optimal subset.


On Input Selection with Reversible Jump Markov Chain Monte Carlo Sampling

Sykacek, Peter

Neural Information Processing Systems

In this paper we will treat input selection for a radial basis function (RBF) like classifier within a Bayesian framework. We approximate the a-posteriori distribution over both model coefficients and input subsets by samples drawn with Gibbs updates and reversible jump moves. Using some public datasets, we compare the classification accuracy of the method with a conventional ARD scheme. These datasets are also used to infer the a-posteriori probabilities of different input subsets. 1 Introduction Methods that aim to determine relevance of inputs have always interested researchers in various communities. Classical feature subset selection techniques, as reviewed in [1], use search algorithms and evaluation criteria to determine one optimal subset.